Abstract

This work studies formal utility and privacy guarantees for a simple multiplicative database 
transformation, where the data are compressed by a random linear or affine transformation, 
reducing the number of data records substantially, while preserving the number of original 
input variables. We provide an analysis framework inspired by a recent concept known as 
differential privacy [7]. Our goal is to show that, despite the general difficulty of achieving 
the differential privacy guarantee, it is possible to publish synthetic data that are useful for a 
number of common statistical learning applications. This includes high dimensional sparse 
regression [23], principal component analysis (PCA), and other statistical measures [15] based
on the covariance of the initial data. 



1 Introduction 

In statistical learning, privacy is increasingly a concern whenever large amounts of confidential data are 
manipulated within or published outside an organization. It is often important to allow researchers to analyze 
data utility without leaking information or compromising the privacy of individual records. In this work, 
we demonstrate that one can preserve utility for a variety of statistical applications while achieving a formal 
definition of privacy. The algorithm we study is a simple random projection by a matrix of independent 
Gaussian random variables that compresses the number of records in the database. Our goal is to preserve 
the privacy of every individual in the database, even if the number of records in the database is very large. In 
particular, we show how this randomized procedure can achieve a form of "differential privacy" [7], while at
the same time showing that the compressed data can be used for Principal Component Analysis (PCA) and
other operations that rely on the accuracy of the empirical covariance matrix computed via the compressed
data, compared to its population or uncompressed counterparts. Toward this goal, we also study
"distributional privacy," which is more natural for many statistical inference tasks.

More specifically, the data are represented as an $n \times p$ matrix $X$. Each of the $p$ columns is an attribute,
and each of the $n$ rows is the vector of attributes for an individual record. The data are compressed by a
random linear transformation $X \mapsto \tilde{X} = \Phi X$, where $\Phi$ is a random $m \times n$ matrix with $m \ll n$. It is also
natural to consider a random affine transformation $X \mapsto \tilde{X} = \Phi X + \Delta$, where $\Delta$ is a random $m \times p$ matrix,
as considered in [23] for privacy analysis; the latter is beyond the scope of this paper and left
as future work. Such transformations have been called "matrix masking" in the privacy literature [6]. The
entries of $\Phi$ are taken to be independent Gaussian random variables, but other distributions are possible.
The resulting compressed data can then be made available for statistical analyses; that is, we think of $\tilde{X}$ as
"public," while $\Phi$ and $\Delta$ are private and only needed at the time of compression. However, even if $\Phi$ were
revealed, recovering $X$ from $\tilde{X}$ requires solving a highly underdetermined linear system and comes with
information-theoretic privacy guarantees, as demonstrated in [23].
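As a rough illustration of the compression step, the sketch below draws a matrix with independent $N(0, 1/n)$ entries (the scaling stated in Section 3) and multiplies it into the data. The function name `compress`, the list-of-rows layout, and the toy $\pm 1$ data are assumptions made for this example only.

```python
import random

def compress(X, m, rng):
    """Return Phi X for an m x n Gaussian matrix Phi with N(0, 1/n) entries.

    X is a list of n rows, each a list of p floats (hypothetical layout).
    """
    n, p = len(X), len(X[0])
    sd = (1.0 / n) ** 0.5  # standard deviation so that Var(Phi_ij) = 1/n
    X_tilde = []
    for _ in range(m):
        phi_row = [rng.gauss(0.0, sd) for _ in range(n)]  # one private row of Phi
        X_tilde.append([sum(phi_row[i] * X[i][j] for i in range(n)) for j in range(p)])
    return X_tilde

rng = random.Random(0)
X = [[rng.choice((-1.0, 1.0)) for _ in range(3)] for _ in range(200)]  # n = 200, p = 3
X_tilde = compress(X, m=20, rng=rng)
print(len(X_tilde), len(X_tilde[0]))  # number of compressed rows, number of attributes
```

The number of columns is unchanged while the number of rows drops from $n$ to $m$; the rows of the output are mixtures of all original records rather than perturbed copies of individual ones.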

Informally, differential privacy [7] limits the increase in the information that can be learned when any 
single entry is changed in the database. This limit implies [16] that allowing one's data to be included in 
the database is in some sense incentive-compatible. Differential privacy imposes a compelling and clear 
requirement, that when running a privacy-preserving algorithm on two neighboring databases that differ in 
only one entry, the probability of any possible outcome of the algorithm should be nearly (multiplicatively) 
equal. Many existing results in differential privacy use additive output perturbations by adding a small 
amount of random noise to the released information according to the sensitivity of the query function $f$ on
data $X$. In this work, we focus on a class $\mathcal{F}$ of Lipschitz functions that are bounded, up to a constant $L$, by
the differences between two covariance matrices (for example, for $\Sigma = X^T X / n$ and its compressed realization
$\tilde{\Sigma} = X^T \Phi^T \Phi X / m$ given $\Phi$),

$$f \in \mathcal{F} : \ |f(A) - f(D)| \le L \|A - D\| \qquad (1)$$

where $A$, $D$ are positive definite matrices and $\|\cdot\|$ is understood to be any matrix norm (for example, PCA
depends on $\|\Sigma - \tilde{\Sigma}\|_F$). Hence we focus on releasing a multiplicative form of perturbation of the input
data, such that for a particular type of function as in (1), we achieve both utility and privacy. Due to
space limits, we only explore PCA in this paper.

We emphasize that although one could potentially release a version of the covariance matrix to preserve
data privacy while performing PCA and functions as in (1), releasing the compressed data $\Phi X$ is more
informative than releasing the perturbed covariance matrix (or other summaries) alone. For example, Zhou et
al. [23] demonstrated the utility of this random linear transformation by analyzing the asymptotic properties
of a statistical estimator under random projection in the high dimensional setting for $n \ll p$. They showed
that the relevant linear predictors can be learned from the compressed data almost as well as they could be
from the original uncompressed data. Moreover, the actual predictions based on new examples are almost
as accurate as they would be had the original data been made available. Finally, it is possible to release the
compressed data plus some other features of the data to yield more information, although this is beyond the
scope of the current paper. We note that in order to guarantee differential privacy, $p < n$ is required.

In the context of guarding privacy over a set of databases $\mathcal{S}_n = \{X_1, X_2, \ldots\}$, where $\Sigma_j = X_j^T X_j / n, \forall X_j$,
we introduce an additional parameter in our privacy definition, $\Delta_{\max}(\mathcal{S}_n)$, which is an upper bound on pairwise
distances between any two databases $X_1, X_2 \in \mathcal{S}_n$ (differing in any number of rows), according to a
certain distance measure. In some sense, this parametrized approach of tuning the magnitude of the distance
measure $\Delta_{\max}(\mathcal{S}_n)$ is the key idea we elaborate in Section 3.

Toward these goals, we develop key ideas in Section 4 that include measure space truncation and renormalization
for each measure $P_{\Sigma_j}, \forall j$, with law $\mathcal{L}(\cdot | X_j) \sim N(0, \Sigma_j)$; these ideas are essential in order to
guarantee differential privacy, which requires that even for very rare events $\mathcal{E}$, $|\ln P_{\Sigma_i}(\mathcal{E}) / P_{\Sigma_j}(\mathcal{E})|$ remains
small $\forall i, j$. We show that such rare events, when they happen not to be useful for the utilities that we explore,
can be cut out entirely from the output space by simply discarding such outputs and regenerating a
new $\tilde{X}$. In this way, we provide a differential privacy guarantee by avoiding the comparisons made on these
rare events. We conjecture that this is a common phenomenon rather than being specific to our analysis
alone. In some sense, this observation is the inspiration for our distributional privacy definition: over a large
number $n$ of elements drawn from $\mathcal{D}$, the entire ocean of elements, the tail events are even more rare by
the Law of Large Numbers, and hence we can safely truncate events whose measure $P[\mathcal{E}]$ decreases as $n$
increases.

Related work is summarized in Section 1.1. Section 2 formalizes privacy definitions. Section 3 gives
more detail of our probability model and summarizes our results on privacy and PCA (with proof in
Section 5). All technical proofs appear in the Appendix.



1.1 Related Work 

Research on privacy in statistical data analysis has a long history, going back at least to @]. We refer to |@] for 
discussion and further pointers into this literature; recent work includes [20]. Recent approaches to privacy 
include data swapping [13], k-anonymity [21], and cryptographic approaches (for instance, II18L 11211'). Much
of the work on data perturbation for privacy (for example, ill[|l4il22ll ) focuses on additive or multiplicative
perturbation of individual records, which may not preserve similarities or other relationships within the
database. Prior to [23], in U, an information-theoretic quantification of privacy was proposed.

A body of recent work (for example, O, [Hi) explores the tradeoffs between privacy and 

utility while developing the definitions and theory of differential privacy. The two main techniques used to 
achieve differential privacy to date have been additive perturbation of individual database queries by Laplace 
noise and the "exponential mechanism" Biol . In contrast, we provide a polynomial time non-interactive 
algorithm for guaranteeing differential privacy. Our goal is to show that, despite the general difficulty of 
achieving the differential privacy guarantee, it is possible to do so with an efficient algorithm for a specific 
class of functions. 

The work of [15] and [23], like the work presented here, considers low rank random linear transformations
of the data $X$ and discusses privacy and utility. Liu et al. [15] argue heuristically that random
projection should preserve utility for data mining procedures that exploit correlations or pairwise distances
in the data. Their privacy analysis is restricted to observing that recovering $X$ from $\Phi X$ requires solving an
under-determined linear system. Zhou et al. [23] provide information-theoretic privacy guarantees, showing
that the information rate $I(X; \tilde{X}) / (np) \to 0$ as $n \to \infty$. Their work casts privacy in terms of the rate of information
communicated about $X$ through $\tilde{X}$, maximizing over all distributions on $X$. Hence their analysis provides
privacy guarantees in an average sense, whereas in this work we prove differential privacy-style guarantees
that aim to apply to every participant in the database semantically.



2 Definitions and preliminaries 

For a database $D$, let $A$ be a database access mechanism. We present non-interactive database privacy
mechanisms, meaning that $A(D)$ induces a distribution over sanitized output databases $D'$. We first recall
the standard differential privacy definition from Dwork [7].

Definition 2.1. ($\alpha$-Differential Privacy) [7] A randomized function $A$ gives $\alpha$-differential privacy if
for all data sets $D_1$ and $D_2$ differing on at most one element, and all $S \subseteq \mathrm{Range}(A)$,
$P[A(D_1) \in S] \le e^{\alpha} P[A(D_2) \in S]$.
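To make the multiplicative condition concrete, here is a minimal sketch for a mechanism with finitely many outputs, where checking every outcome individually is equivalent to checking every set $S$. The helper `dp_alpha` and the randomized-response example are illustrative assumptions and are not part of the mechanism studied in this paper.

```python
import math

def dp_alpha(p1, p2):
    """Smallest alpha such that p1[o] <= e^alpha * p2[o] and p2[o] <= e^alpha * p1[o]
    for every outcome o, given two output distributions as dicts."""
    alpha = 0.0
    for o in set(p1) | set(p2):
        a, b = p1.get(o, 0.0), p2.get(o, 0.0)
        if a == 0.0 and b == 0.0:
            continue
        if a == 0.0 or b == 0.0:
            return math.inf  # one database can produce o, the other cannot
        alpha = max(alpha, abs(math.log(a / b)))
    return alpha

# Randomized response on a single bit: report the truth with probability 3/4.
out_if_bit_0 = {0: 0.75, 1: 0.25}
out_if_bit_1 = {0: 0.25, 1: 0.75}
alpha = dp_alpha(out_if_bit_0, out_if_bit_1)
print(alpha, math.log(3))  # the two values agree up to rounding
```

An outcome that is possible under one database but impossible under the other makes the ratio unbounded, which is why rare tail events need special care in the continuous setting treated below.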



We now formalize our notation. 
Notation: Let $\mathcal{D}$ be a collection of all records (potentially coming from some underlying distribution) and
$\sigma(\mathcal{D})$ represent the entire set of input databases with elements drawn from $\mathcal{D}$. Let $\mathcal{S}_n = \{X_1, X_2, \ldots\} \subset \sigma(\mathcal{D})$,
where $X_i \in \sigma(\mathcal{D}), \forall i$, denote a set of databases, each with $n$ elements drawn from $\mathcal{D}$. Although
differential privacy is defined with respect to all $D, E \in \sigma(\mathcal{D})$, we constrain the definition of distributional
privacy to the scope of $\mathcal{S}_n$, which becomes clear in Definition 2.4. We let $\mathcal{D}'$ be the entire set of possible
output databases.

Definition 2.2. A privacy algorithm $A$ takes an input database $D \in \sigma(\mathcal{D})$ and outputs a probability measure
$P_D$ on $\mathcal{D}'$, where $\mathcal{D}'$ is allowed to be different from $\sigma(\mathcal{D})$. Let $\mathcal{P}$ denote all probability measures on $\mathcal{D}'$.
Then a privacy algorithm is a map $A : \sigma(\mathcal{D}) \to \mathcal{P}$ where $A(D) = P_D, \forall D \in \sigma(\mathcal{D})$.

We now define differential privacy for continuous output. We introduce an additional parameter $\delta$ which
measures how different two databases are according to $V$ below.

Definition 2.3. Let $V(D, E)$ be the distance between $D$ and $E$ according to a certain metric, which is
related to the utility we aim to provide. Let $d(D, E)$ denote the number of rows in which $D$ and $E$ differ.
$\delta$-constrained $\alpha$-Differential Privacy ($(\alpha, \delta)$-Differential Privacy) requires the following condition,

$$\sup_{D, E :\ d(D, E) = 1,\ V(D, E) \le \delta} \Delta(P_D, P_E) \le e^{\alpha}, \qquad (2)$$

where $\Delta(P, Q) = \operatorname{ess\,sup}_{D \in \mathcal{D}'} \frac{dP}{dQ}(D)$ denotes the essential supremum over $\mathcal{D}'$ of the Radon-Nikodym
derivative $dP/dQ$.

Let $\mathcal{S}_n = \{X_1, X_2, \ldots\}$ be a set of databases of $n$ records. Let $\Delta_{\max}(\mathcal{S}_n)$ bound the pairwise distance
between $X_i, X_j \in \mathcal{S}_n, \forall i, j$. We now introduce a notion of distributional privacy that is similar in spirit to
that in H).

Definition 2.4. (Distributional Privacy for Continuous Outcome) An algorithm $A$ satisfies
$(\alpha, \delta)$-distributional privacy on $\mathcal{S}_n$, for which a global parameter $\Delta_{\max}(\mathcal{S}_n)$ is specified, if for any two
databases $X_1, X_2 \in \mathcal{S}_n$ such that each consists of $n$ elements drawn from $\mathcal{D}$, where $X_1 \cap X_2$ may not
be empty, and for all sanitized outputs $\tilde{X} \in \mathcal{D}'$,

$$f_{X_1}(\tilde{X}) \le e^{\alpha} f_{X_2}(\tilde{X}), \quad \forall X_1, X_2 \ \text{s.t.}\ V(X_1, X_2) \le \delta \qquad (3)$$

where $f_{X_j}(\cdot)$ is the density function for the conditional distribution with law $\mathcal{L}(\cdot | X_j), \forall j$, given $X_j$.

Note that this composes nicely if one is considering databases that differ in multiple rows. In particular,
randomness in $X_j$ is not directly exploited in the definition, as we treat elements in $X_j \in \sigma(\mathcal{D})$ as fixed data.
One could assume that they come from an underlying distribution, e.g., a multivariate Gaussian $N(0, \Sigma^*)$,
and infer the distance between $\Sigma_j$ and its population counterpart $\Sigma^*$. We now show that distributional
privacy is a stronger concept than differential privacy.

Theorem 2.5. Given $\mathcal{S}_n$, if $A$ satisfies $(\alpha, \delta)$-distributional privacy as in Definition 2.4 for all $X_j \in \mathcal{S}_n$, then
$A$ satisfies $(\alpha, \delta)$-Differential Privacy as in Definition 2.3 for all $X_j \in \mathcal{S}_n$.

Proof. For the same constraint parameter $\delta$, if we guarantee that (3) is satisfied for all $X_i, X_j \in \mathcal{S}_n$ that
differ only in a single row and such that $V(X_i, X_j) \le \delta$, we have shown $\alpha$-differential privacy on $\mathcal{S}_n$; clearly,
this type of guarantee is necessary in order to guarantee $\alpha$-distributional privacy over all $X_i, X_j \in \mathcal{S}_n$ that
satisfy the $\delta$ constraint. □

3 Probability model and summary of results

Let $(X_i)$ represent the matrix corresponding to $X_i \in \mathcal{S}_n$. By default, we use $(X_i)_j \in \mathbb{R}^p, \forall j = 1, \ldots, n$,
and $(X_i^T)_j \in \mathbb{R}^n, \forall j = 1, \ldots, p$, to denote row vectors and column vectors of matrix $(X_i)$ respectively.
Throughout this paper, we assume that given any $X_i \in \mathcal{S}_n$, columns are normalized,

$$\|(X_i^T)_j\|_2^2 = n, \quad \forall j = 1, \ldots, p, \ \forall X_i \in \mathcal{S}_n \qquad (4)$$

which can be taken as the first step of our sanitization scheme. Given $X_i$, $\Phi_{m \times n}$ induces a distribution
over all $m \times p$ matrices in $\mathbb{R}^{m \times p}$ via $\tilde{X} = \Phi X_i$, where $\Phi_{ij} \sim N(0, 1/n), \forall i, j$. Let $\mathcal{L}(\cdot | X_i)$ denote the
conditional distribution given $X_i$ and $P_{\Sigma_i}$ denote its probability measure, where $\Sigma_i = X_i^T X_i / n, \forall X_i \in \mathcal{S}_n$.
Hence $\tilde{X} = (x_1, \ldots, x_m)^T$ is a Gaussian ensemble composed of $m$ i.i.d. random vectors with
$\mathcal{L}(x_j | X_i) \sim N(0, \Sigma_i), \forall j = 1, \ldots, m$.
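The probability model can be checked numerically. The sketch below, under the assumption that a database is stored as a list of rows, rescales the columns as in (4), forms $X^T X / n$, and compares it entrywise with the compressed sample covariance $\tilde{X}^T \tilde{X} / m$. The helper names `normalize_columns` and `gram` and the problem sizes are hypothetical choices for illustration.

```python
import random

def normalize_columns(X):
    """Rescale each column of X (list of n rows) to squared Euclidean norm n, as in (4)."""
    n, p = len(X), len(X[0])
    scale = [(n / sum(X[i][j] ** 2 for i in range(n))) ** 0.5 for j in range(p)]
    return [[X[i][j] * scale[j] for j in range(p)] for i in range(n)]

def gram(M, divisor):
    """Return M^T M / divisor as a p x p list of lists."""
    p = len(M[0])
    return [[sum(r[j] * r[k] for r in M) / divisor for k in range(p)] for j in range(p)]

rng = random.Random(1)
n, p, m = 1000, 3, 200
X = normalize_columns([[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)])
Sigma = gram(X, n)  # after (4), every diagonal entry of Sigma equals 1

sd = (1.0 / n) ** 0.5  # entries of Phi are N(0, 1/n)
X_tilde = []
for _ in range(m):
    phi_row = [rng.gauss(0.0, sd) for _ in range(n)]
    X_tilde.append([sum(phi_row[i] * X[i][j] for i in range(n)) for j in range(p)])
Sigma_tilde = gram(X_tilde, m)  # compressed sample covariance

max_dev = max(abs(Sigma_tilde[j][k] - Sigma[j][k]) for j in range(p) for k in range(p))
print(max_dev)  # entrywise gap between compressed and original covariance
```

Since each row of $\tilde{X}$ has covariance $\Sigma$, the gap shrinks as $m$ grows; this entrywise gap is the quantity that the truncation step of Section 4 controls.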

Given a set of databases $\mathcal{S}_n = \{X_1, X_2, \ldots\}$, we do assume there is a true parameter $\Sigma^*$ such that
$\Sigma_1, \Sigma_2, \ldots$, where $\Sigma_i = X_i^T X_i / n$, are just a sequence of empirical parameters computed from databases
$X_1, X_2, \ldots \in \mathcal{S}_n$. Define

$$\Delta_{\max}(\mathcal{S}_n) := 2 \sup_{X_i \in \mathcal{S}_n} \max_{\ell, k} |\Sigma_i(\ell, k) - \Sigma^*(\ell, k)|. \qquad (5)$$

Although we do not suppose we know $\Sigma^*$, we do compute $\Sigma_i, \forall i$. Thus $\Delta_{\max}(\mathcal{S}_n)$ provides an upper bound
on the perturbations between any two databases $X_i, X_j \in \mathcal{S}_n$:

$$\max_{\ell, k} |\Sigma_i(\ell, k) - \Sigma_j(\ell, k)| \le \Delta_{\max}(\mathcal{S}_n). \qquad (6)$$

We now relate two other parameters that measure pairwise distances between elements in $\mathcal{S}_n$ to $\Delta_{\max}(\mathcal{S}_n)$.
For a symmetric matrix $M$, $\lambda_{\min}(M)$ and $\lambda_{\max}(M) = \|M\|_2$ are the smallest and largest eigenvalues respectively,
and the Frobenius norm is given by $\|M\|_F = \sqrt{\sum_i \sum_j M_{ij}^2}$.

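Proposition 3.1. Subject to normalization as in (4), w.l.o.g., for any two databases $X_1, X_i$, let $\Delta = \Sigma_1 - \Sigma_i$
and $\Gamma = \Sigma_i^{-1} - \Sigma_1^{-1} = \Sigma_i^{-1} (\Sigma_1 - \Sigma_i) \Sigma_1^{-1} = \Sigma_i^{-1} \Delta \Sigma_1^{-1}$. Suppose $\max_{j,k} |(\Sigma_1 - \Sigma_i)_{jk}| \le \Delta_{\max}(\mathcal{S}_n), \forall i$;
then

$$\|\Delta\|_F \le p \, \Delta_{\max}(\mathcal{S}_n) \quad \text{and} \qquad (7)$$

$$\|\Gamma\|_F \le \frac{\|\Delta\|_F}{\lambda_{\min}(\Sigma_1) \lambda_{\min}(\Sigma_i)}. \qquad (8)$$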
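Both bounds follow from elementary norm inequalities. The short derivation below is a sketch under the reading $\Delta = \Sigma_1 - \Sigma_i$ and $\Gamma = \Sigma_i^{-1} \Delta \Sigma_1^{-1}$ of the proposition; the step-by-step layout is added here for illustration and is not taken from the original text.

```latex
\begin{align*}
\|\Delta\|_F^2
  &= \sum_{j=1}^{p} \sum_{k=1}^{p} \Delta_{jk}^2
   \le p^2 \max_{j,k} \Delta_{jk}^2
   \le p^2 \, \Delta_{\max}(\mathcal{S}_n)^2, \\
\|\Gamma\|_F
  &= \|\Sigma_i^{-1} \Delta \Sigma_1^{-1}\|_F
   \le \|\Sigma_i^{-1}\|_2 \, \|\Delta\|_F \, \|\Sigma_1^{-1}\|_2
   = \frac{\|\Delta\|_F}{\lambda_{\min}(\Sigma_i) \, \lambda_{\min}(\Sigma_1)} .
\end{align*}
```

The second line uses $\|AB\|_F \le \|A\|_2 \|B\|_F$ together with $\|\Sigma^{-1}\|_2 = 1 / \lambda_{\min}(\Sigma)$ for a positive definite $\Sigma$.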

Suppose we choose a reference point $\Sigma_1$ which can be thought of as an approximation to the true value $\Sigma^*$.

Assumption 1: Let $\lambda_{\min}(\Sigma_1^{-1}) = \frac{1}{\|\Sigma_1\|_2} \ge C_{\min}$ for some constant $C_{\min} > 0$. Suppose $\|\Gamma\|_2 = o(1)$ and
$\|\Delta\|_2 = o(1)$.

Assumption 1 is crucial in the sense that it guarantees that all matrices in $\mathcal{S}_n$ stay away from being
singular (see Lemma 3.3). We are now ready to state the first main result. Proof of the theorem appears in
Section A.

Theorem 3.2. Suppose Assumption 1 holds. Assume that $\|\Sigma_1\|_2$, $\lambda_{\min}(\Sigma_1)$ and $\lambda_{\min}(\Sigma_i), \forall X_i \in \mathcal{S}_n$, are
all of the same order, and $m \ge \Omega(\ln 2np)$. Consider the worst case realization when $\|\Delta\|_F = \Theta(p \, \Delta_{\max}(\mathcal{S}_n))$,
where $\Delta_{\max} < 1$.

In order to guard (distributional) privacy for all $X_i \in \mathcal{S}_n$ in the sense of Definition 2.4, it is sufficient if

$$\Delta_{\max}(\mathcal{S}_n) = o\left( 1 / \left( p^2 \sqrt{m \ln 2np} \right) \right). \qquad (9)$$

The following lemma is a standard result on existence conditions for $\Sigma_i^{-1}$ given $\Sigma_1^{-1}$. It also shows that
all eigenvalue conditions in Theorem 3.2 indeed hold given Assumption 1.

Lemma 3.3. Let $\lambda_{\min}(\Sigma_1) > 0$. Let $\Delta = \Sigma_1 - \Sigma_i$ and $\|\Delta\|_2 < \lambda_{\min}(\Sigma_1)$. Then $\lambda_{\min}(\Sigma_i) \ge \lambda_{\min}(\Sigma_1) - \|\Delta\|_2$.

Next we use the result by Zwald and Blanchard for PCA as an instance from (1) to illustrate the tradeoff
between parameters. Proof of Theorem 3.5 appears in Section 5.

Proposition 3.4. ([25]) Let $A$ be a symmetric positive Hilbert-Schmidt operator of Hilbert space $\mathcal{H}$ with
simple nonzero eigenvalues $\lambda_1 > \lambda_2 > \cdots$. Let $D > 0$ be an integer such that $\lambda_D > 0$ and $\delta_D = \frac{1}{2}(\lambda_D - \lambda_{D+1})$.
Let $B \in HS(\mathcal{H})$ be another symmetric operator such that $\|B\|_F < \delta_D / 2$ and $A + B$ is still a
positive operator. Let $P^D(A)$ (resp. $P^D(A + B)$) denote the orthogonal projector onto the subspace spanned
by the first $D$ eigenvectors of $A$ (resp. $(A + B)$). Then these satisfy

$$\|P^D(A) - P^D(A + B)\|_F \le \|B\|_F / \delta_D. \qquad (10)$$

Subject to measure truncation of at most $1/n^2$ in each $P_{\Sigma_i}, \forall X_i \in \mathcal{S}_n$, as we show in Section 4, we have,

Theorem 3.5. Suppose Assumption 1 holds. If we allow $\Delta_{\max}(\mathcal{S}_n) = O(\sqrt{\log p / n})$, then we essentially
perform PCA on the compressed sample covariance matrix $\tilde{X}^T \tilde{X} / m$ effectively in the sense of
Proposition 3.4, that is, in the form of (10) with $A = X^T X / n$ and $B = \tilde{X}^T \tilde{X} / m - A$, where $\|B\|_F = o(1)$ for
$m = \Omega(p^2 \ln 2np)$. On the other hand, the databases in $\mathcal{S}_n$ are private in the sense of Definition 2.4, so long
as $p^2 = O(\sqrt{n/m} / \log n)$. Hence in the worst case, we require

$$p = o\left( n^{1/6} / \sqrt{\ln 2np} \right).$$

As a special case, we look at the following example. 

Example 3.6. Let $X_1 = [x_1, \ldots, x_n]^T$ be a matrix in $\{-1, 1\}^{n \times p}$. A neighboring matrix $X_2$ is any matrix
obtained via changing the signs on $\tau p$ bits, where $0 < \tau < 1$, on any $x_i$.

Corollary 3.7. For Example 3.6, it suffices if $p = o(n / \log n)^{1/4}$ in order to conduct PCA on compressed
data (subject to measure truncation of at most $1/n^2$ in each $P_{\Sigma_i}, \forall X_i \in \mathcal{S}_n$) effectively in the sense
of Proposition 3.4, while preserving $\alpha$-differential privacy for $\alpha = o(1)$.

4 Distributional privacy with bounded $\Delta_{\max}(\mathcal{S}_n)$

In this section, we show how we can modify the output events $\tilde{X}$ to effectively hide some large-tail events.
We make it clear how these tail events are connected to a particular type of utility. Given $X_i$, let $\tilde{X} = \Phi X_i = (x_1, \ldots, x_m)^T$.
Let $f_{\Sigma_i}(x_j) = \exp\{ -\frac{1}{2} x_j^T \Sigma_i^{-1} x_j \} / ( |\Sigma_i|^{1/2} (2\pi)^{p/2} )$ be the density for the Gaussian distribution
$N(0, \Sigma_i)$. Before modification, the density function is

$$f_{\Sigma_i}(\tilde{X}) = \prod_{j=1}^{m} f_{\Sigma_i}(x_j). \qquad (11)$$

We focus on defining two procedures that lead to both distributional and differential types of privacy. Indeed,
the proof of Theorem 4.6 applies to both, as the distance metric $V(X_1, X_2)$ does not specify how many rows
$X_1$ and $X_2$ differ in. We use $\Delta_{\max}$ as a shorthand for $\Delta_{\max}(\mathcal{S}_n)$ when it is clear.

Procedure 4.1. (Truncation of the Tail for Random Vectors in $\mathbb{R}^p$) We require $\Phi$ to be an
independent random draw each time we generate an $\tilde{X}$ for compression (or when we apply it to the same
dataset for handling a truncation event). W.l.o.g., we choose $\Sigma_1$ to be a reference point. Now we only
examine output databases $\tilde{X} \in \mathbb{R}^{m \times p}$ such that for $C = \sqrt{2(C_1 + C_2)}$, where $C_1 \approx 2.5$ and $C_2 \approx 7.7$,

$$\max_{j,k} \left| (\tilde{X}^T \tilde{X} / m)_{jk} - \Sigma_1(j, k) \right| \le C \sqrt{\ln 2np / m} + \Delta_{\max}, \qquad (12)$$

where $\Delta_{\max}(\mathcal{S}_n) = O(\sqrt{\log n / n})$. Algorithmically, one can imagine that for an input $X$, each time we see
an output $\tilde{X} = \Phi X$ that does not satisfy our need in the sense of (12), we throw the output database $\tilde{X}$ away,
generate a new random draw $\Phi'$ to calculate $\Phi' X$, and repeat until $\Phi' X$ indeed satisfies (12). We also
note that the adversary neither sees the databases we throw away nor finds out that we did so.
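The discard-and-redraw loop described in Procedure 4.1 can be sketched as plain rejection sampling. In the sketch below, the function names, the `max_tries` safety cap, and the toy $\pm 1$ database are assumptions introduced for illustration; the constants are the approximate values quoted with Lemma B.1, and the reference covariance is taken to be that of the input database itself with $\Delta_{\max} = 0$.

```python
import math
import random

C1, C2 = 2.5044, 7.6885              # approximate constants quoted with Lemma B.1
C = math.sqrt(2 * (C1 + C2))

def compress(X, m, rng):
    """One draw of Phi X, with Phi an m x n matrix of independent N(0, 1/n) entries."""
    n, p = len(X), len(X[0])
    sd = (1.0 / n) ** 0.5
    out = []
    for _ in range(m):
        phi_row = [rng.gauss(0.0, sd) for _ in range(n)]
        out.append([sum(phi_row[i] * X[i][j] for i in range(n)) for j in range(p)])
    return out

def gram(M, divisor):
    """Return M^T M / divisor as a p x p list of lists."""
    p = len(M[0])
    return [[sum(r[j] * r[k] for r in M) / divisor for k in range(p)] for j in range(p)]

def max_deviation(X_tilde, Sigma1):
    """Left-hand side of (12): largest entrywise gap between X~^T X~ / m and Sigma1."""
    S = gram(X_tilde, len(X_tilde))
    p = len(Sigma1)
    return max(abs(S[j][k] - Sigma1[j][k]) for j in range(p) for k in range(p))

def truncated_compress(X, Sigma1, m, delta_max, rng, max_tries=100):
    """Redraw Phi until the output satisfies (12); max_tries is a hypothetical safety cap."""
    n, p = len(X), len(X[0])
    threshold = C * math.sqrt(math.log(2 * n * p) / m) + delta_max
    for _ in range(max_tries):
        X_tilde = compress(X, m, rng)      # fresh, independent Phi on every attempt
        if max_deviation(X_tilde, Sigma1) <= threshold:
            return X_tilde, threshold      # rejected draws are never released
    raise RuntimeError("no draw satisfied (12) within max_tries")

rng = random.Random(0)
n, p, m = 500, 2, 160                      # m exceeds 2 (C1 + C2) ln(2np), about 155 here
X = [[rng.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]  # columns satisfy (4)
Sigma1 = gram(X, n)
X_tilde, threshold = truncated_compress(X, Sigma1, m, delta_max=0.0, rng=rng)
print(max_deviation(X_tilde, Sigma1) <= threshold)  # holds by construction
```

With $m$ chosen as in Lemma 4.4, a rejection occurs with probability at most $1/n^2$ per draw, so the loop almost always returns on the first attempt.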

Given $X_i \in \mathcal{S}_n$, let $P_{\Sigma_i}$ be the probability measure over random outcomes of $\Phi X_i$. Upon truncation,

Procedure 4.2. (Renormalization) We set $f'_{\Sigma_i}(\tilde{X}) = 0$ for all $\tilde{X} \in \mathbb{R}^{m \times p}$ belonging to set $E$, where

$$E = \left\{ \tilde{X} : \max_{j,k} \left| (\tilde{X}^T \tilde{X} / m)_{jk} - \Sigma_1(j, k) \right| > C \sqrt{\ln 2np / m} + \Delta_{\max} \right\} \qquad (13)$$

corresponds to the bad events that we truncate from the outcome in Procedure 4.1. We then renormalize the
density as in (11) on the remaining $\tilde{X}$ that satisfy (12) to obtain:

$$f'_{\Sigma_i}(\tilde{X}) = \frac{f_{\Sigma_i}(\tilde{X})}{1 - P_{\Sigma_i}[E]}. \qquad (14)$$

Remark 4.3. Hence $\frac{f'_{\Sigma_i}(\tilde{X})}{f'_{\Sigma_j}(\tilde{X})} = \frac{f_{\Sigma_i}(\tilde{X}) (1 - P_{\Sigma_j}[E])}{f_{\Sigma_j}(\tilde{X}) (1 - P_{\Sigma_i}[E])}$, which changes $\alpha(m, \delta)$, which we bound below based on the
original density prior to truncation of $E$, by a constant of the order of $\ln(1 + \epsilon) = O(\epsilon)$, where $\epsilon = O(1/n^2)$.
Hence we safely ignore this normalization issue given that it only changes $\alpha(m, \delta)$ by $O(1/n^2)$.

The following lemma bounds the probability of the events that we truncate in Procedure 4.1. Proof of
Lemma 4.4 appears in Section B.

Lemma 4.4. According to any individual probability measure $P_{\Sigma_i}$, which corresponds to the sample space
for outcomes of $\Phi X_i$, suppose that the columns of $(X_i)$ have been normalized to have $\|(X_i^T)_j\|_2^2 = n, \forall i, j = 1, \ldots, p$,
and $m \ge 2(C_1 + C_2) \ln 2np$; then for $E$ as defined in (13), $P_{\Sigma_i}[E] \le \frac{1}{n^2}$.

As hinted after Definition 2.4 regarding distributional privacy, we can think of the input data as coming
from a distribution, such that $\Delta_{\max}(\mathcal{S}_n)$ in (5) can be derived with a typical large deviation bound between
the sample and population covariances. For example, for multivariate Gaussian,

Lemma 4.5. Suppose $(X_i)_j \sim N(0, \Sigma^*), \forall j = 1, \ldots, n$ for all $X_i \in \mathcal{S}_n$; then $\Delta_{\max}(\mathcal{S}_n) = O_P\left( \sqrt{\log p / n} \right)$.

We now state the main result of this section. Proof of Theorem 4.6 appears in Section C.



Theorem 4.6. Under Assumption 1, let $m$ and $\|(X_i^T)_j\|_2, \forall i, j$, satisfy the conditions in Lemma 4.4. By
truncating a subset of measure at most $1/n^2$ from each $P_{\Sigma_i}$, in the sense of Procedure 4.1, and renormalizing
the density functions according to Procedure 4.2, we have

$$\alpha(m, \delta) \le \frac{m p \|\Delta\|_F}{2 \lambda_{\min}(\Sigma_i) \lambda_{\min}(\Sigma_1)} \left( C \sqrt{\frac{\ln 2np}{m}} + \Delta_{\max} + \frac{2 \|\Delta\|_F \|\Sigma_1\|_2^2}{p \, \lambda_{\min}(\Sigma_i) \lambda_{\min}(\Sigma_1)} \right) + o(1) \qquad (15)$$

when comparing all $X_i \in \mathcal{S}_n$ with $X_1$, where $\|\Gamma\|_F$ is bounded as in (8) for $i = 2$.

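Remark 4.7. While the theorem only states results for comparing $f_{\Sigma_i}$ with $f_{\Sigma_1}$, we note that $\forall X_i, X_j \in \mathcal{S}_n$,

$$\left| \ln \frac{f_{\Sigma_i}(\cdot)}{f_{\Sigma_j}(\cdot)} \right| \le \left| \ln \frac{f_{\Sigma_i}(\cdot)}{f_{\Sigma_1}(\cdot)} \right| + \left| \ln \frac{f_{\Sigma_1}(\cdot)}{f_{\Sigma_j}(\cdot)} \right|,$$

which is simply a sum of terms bounded as in (15).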



5 Proof of Theorem 3.5

Combining the following theorem, which illustrates the tradeoff between the parameters $n$, $p$ and $m$ for
PCA, with Theorem 3.2, we obtain Theorem 3.5.

Theorem 5.1. For a database $X \in \mathcal{S}_n$, let $A$, $A + B$ be the original and compressed sample covariance
matrices respectively: $A = X^T X / n$ and $B = \tilde{X}^T \tilde{X} / m - A$, where $\tilde{X}$ is generated via Procedure 4.1. By
requiring that $m = \Omega(p^2 \ln 2np)$, we can achieve meaningful bounds in the form of (10).

Proof. We know that $A$ and $A + B$ are both positive definite, and $B$ is symmetric. We first obtain a bound
on $\|B\|_F = \sqrt{\sum_{i=1}^{p} \sum_{j=1}^{p} B_{ij}^2} \le p \tau$, where

$$\tau := \max_{j,k} |B_{jk}| = \max_{j,k} \left| (\tilde{X}^T \tilde{X} / m)_{jk} - A_{jk} \right|$$
$$\le \max_{j,k} \left| (\tilde{X}^T \tilde{X} / m)_{jk} - \Sigma_1(j, k) \right| + \max_{j,k} \left| \Sigma_1(j, k) - A_{jk} \right|$$
$$\le C \sqrt{\ln 2np / m} + 2 \Delta_{\max}(\mathcal{S}_n),$$

by (12), (6), and the triangle inequality, for $\tilde{X} = \Phi X$. The theorem follows by Proposition 3.4 given that
$\|B\|_F = o(1)$ for $m = \Omega(p^2 \ln 2np)$. □

A Proof of Theorem 3.2

Proof of Theorem 3.2. First we plug $\|\Delta\|_F = p \, \Delta_{\max}$ into (15), and we require each term in (15) to be $o(1)$;
hence we require that $p \, \Delta_{\max} = o(1)$ and

$$p^2 \Delta_{\max} \sqrt{m \ln 2np} = o(1) \quad \text{and} \quad m p^2 \Delta_{\max}^2 = o(1), \qquad (16)$$

which are all satisfied given (9). Note that (16) implies that $\|\Delta\|_2 \le \|\Delta\|_F = p \, \Delta_{\max} = o(1)$; hence
conditions in Assumption 1 are satisfied. □
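The claim that condition (9) covers every requirement in (16) can be spelled out as follows. This is a sketch that reads (16) as the two conditions $p^2 \Delta_{\max} \sqrt{m \ln 2np} = o(1)$ and $m p^2 \Delta_{\max}^2 = o(1)$; the shorthand $u$ is introduced here only for the derivation.

```latex
\begin{align*}
u &:= p^2 \, \Delta_{\max} \sqrt{m \ln 2np} = o(1) && \text{(this is exactly (9))}, \\
m p^2 \Delta_{\max}^2 &= \frac{u^2}{p^2 \ln 2np} \le \frac{u^2}{\ln 2} = o(1)
  && \text{since } p \ge 1,\ \ln 2np \ge \ln 2, \\
p \, \Delta_{\max} &= \frac{u}{p \sqrt{m \ln 2np}} \le \frac{u}{\sqrt{\ln 2}} = o(1)
  && \text{since } p \ge 1,\ m \ge 1 .
\end{align*}
```

So the first condition in (16) is the binding one, and the other two follow from it.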



B Proof of Lemma 4.4 



Let us first state the following lemma. 

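Lemma B.1. (See [23] for example) Let $x, y \in \mathbb{R}^n$ with $\|x\|_2, \|y\|_2 \le 1$. Assume that $\Phi$ is an $m \times n$
random matrix with independent $N(0, 1/n)$ entries (independent of $x, y$). Then for all $\tau > 0$,

$$P\left[ \left| \frac{n}{m} \langle \Phi x, \Phi y \rangle - \langle x, y \rangle \right| \ge \tau \right] \le 2 \exp\left( \frac{-m \tau^2}{C_1 + C_2 \tau} \right) \qquad (17)$$

with $C_1 = \frac{4e}{\sqrt{6\pi}} \approx 2.5044$ and $C_2 = \sqrt{8} e \approx 7.6885$.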
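A quick numerical sanity check of the lemma is sketched below. The closed forms used for the constants are assumptions chosen because they evaluate to the quoted values 2.5044 and 7.6885; the function names, the choice $y = x$, and the simulation sizes are likewise illustrative only.

```python
import math
import random

C1 = 4 * math.e / math.sqrt(6 * math.pi)   # assumed closed form; evaluates to about 2.5044
C2 = math.sqrt(8) * math.e                 # assumed closed form; evaluates to about 7.6885

def tail_bound(m, tau):
    """Right-hand side of (17)."""
    return 2.0 * math.exp(-m * tau ** 2 / (C1 + C2 * tau))

def deviation(x, y, m, rng):
    """One draw of |(n/m) <Phi x, Phi y> - <x, y>| with N(0, 1/n) entries in Phi."""
    n = len(x)
    sd = (1.0 / n) ** 0.5
    inner = 0.0
    for _ in range(m):
        row = [rng.gauss(0.0, sd) for _ in range(n)]
        inner += sum(r * a for r, a in zip(row, x)) * sum(r * b for r, b in zip(row, y))
    return abs(n / m * inner - sum(a * b for a, b in zip(x, y)))

rng = random.Random(0)
n, m, tau, trials = 50, 40, 0.9, 200
x = [1.0 / math.sqrt(n)] * n               # a unit vector; y = x is the simplest test case
freq = sum(deviation(x, x, m, rng) >= tau for _ in range(trials)) / trials
print(round(C1, 4), round(C2, 4))
print(freq, tail_bound(m, tau))            # empirical frequency versus the bound in (17)
```

The bound is not tight for small $m$, so the empirical frequency is typically far below it; the check only confirms that the simulated tail does not exceed the stated right-hand side.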



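Proof of Lemma 4.4. Let $X_{ij} \in \mathbb{R}^n$ denote the $j$th column, $\forall j = 1, \ldots, p$, of an $n \times p$ matrix $X_i \in \mathcal{S}_n$.
W.l.o.g., we focus on $P_{\Sigma_i}$ for $i = 1, 2$. We first note that by the triangle inequality, for $X_1, X_2 \in \mathcal{S}_n$,

for $\tilde{X} = \Phi X_1$:

$$\left| \left( \frac{\tilde{X}^T \tilde{X}}{m} \right)_{jk} - \Sigma_1(j, k) \right| = \left| \frac{1}{m} \langle (\Phi X_1)_j, (\Phi X_1)_k \rangle - \frac{1}{n} \langle X_{1j}, X_{1k} \rangle \right|,$$

for $\tilde{X} = \Phi X_2$:

$$\left| \left( \frac{\tilde{X}^T \tilde{X}}{m} \right)_{jk} - \Sigma_1(j, k) \right| \le \left| \frac{1}{m} \langle (\Phi X_2)_j, (\Phi X_2)_k \rangle - \frac{1}{n} \langle X_{2j}, X_{2k} \rangle \right| + \max_{j,k} |\Delta_{jk}|,$$

where $\max_{j,k} |\Delta_{jk}| \le \Delta_{\max}(\mathcal{S}_n)$ by definition.

For each $P_{\Sigma_i}$, we let $\mathcal{E}$ represent the union of the following events, where $\tau = \sqrt{2(C_1 + C_2) \ln 2np / m}$:
$\exists j, k \in \{1, \ldots, p\}$ s.t.

$$\left| \frac{1}{m} \langle (\Phi X_i)_j, (\Phi X_i)_k \rangle - \frac{1}{n} \langle (X_i)_j, (X_i)_k \rangle \right| \ge \tau.$$

It is obvious that if $\mathcal{E}^c$ holds, the inequality in the lemma holds for all $P_{\Sigma_i}$. Thus we
only need to show that

$$\sup_{X_i \in \mathcal{S}_n} P_{\Sigma_i}[\mathcal{E}] \le 1/n^2, \quad \text{where } \Sigma_i = X_i^T X_i / n. \qquad (18)$$

We first bound the probability of a single event counted in $\mathcal{E}$, which is invariant across all $P_{\Sigma_i}$, and we
thus do not differentiate. Consider two column vectors $x = X_j / \sqrt{n}$, $y = X_k / \sqrt{n} \in \mathbb{R}^n$ in matrix $X / \sqrt{n}$; we have
$\|x\|_2 = 1$, $\|y\|_2 = 1$. Hence for $\tau \le 1$, by Lemma B.1,

$$P\left[ \left| \frac{1}{m} \langle \Phi X_j, \Phi X_k \rangle - \frac{1}{n} \langle X_j, X_k \rangle \right| \ge \tau \right] = P\left[ \left| \frac{n}{m} \langle \Phi x, \Phi y \rangle - \langle x, y \rangle \right| \ge \tau \right]$$
$$\le 2 \exp\left( \frac{-m \tau^2}{C_1 + C_2 \tau} \right) \le 2 \exp\left( -\frac{m \tau^2}{C_1 + C_2} \right).$$

We can now bound the probability that any such large-deviation event happens. Recall that $p$ is the
total number of columns of $X$; hence the total number of events in $\mathcal{E}$ is $\frac{p(p+1)}{2}$. Now by taking
$\tau = \sqrt{2(C_1 + C_2) \ln 2np / m} \le 1$, where $m \ge 2(C_1 + C_2) \ln 2np$, we have for all $X_i$,

$$P_{\Sigma_i}[\mathcal{E}] \le \frac{p(p+1)}{2} P\left[ \left| \frac{1}{m} \langle \Phi X_j, \Phi X_k \rangle - \frac{1}{n} \langle X_j, X_k \rangle \right| \ge \tau \right] \le p(p+1) \exp\left( -\frac{m \tau^2}{C_1 + C_2} \right) \le \frac{1}{n^2}.$$

This implies (18) and hence the lemma holds. □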



C Proof of Theorem 4.6 



W.l.o.g., we compare $\Sigma_2$ and $\Sigma_1$. We first focus on bounding $\ln |\Sigma_2| - \ln |\Sigma_1|$. The following proposition
comes from an existing result.

Proposition C.1. ([24]) Suppose $\|\Gamma\|_2 = o(1)$ and $\|\Delta\|_2 = o(1)$. Then for $\Theta_i = \Sigma_i^{-1}$,

$$\ln |\Sigma_2| - \ln |\Sigma_1| = \ln |\Theta_1| - \ln |\Theta_2| = A - \mathrm{tr}(\Gamma \Sigma_1),$$

where

$$A = \mathrm{vec}\,\Gamma^T \left( \int_0^1 (1 - v) (\Theta_1 + v \Gamma)^{-1} \otimes (\Theta_1 + v \Gamma)^{-1} \, dv \right) \mathrm{vec}\,\Gamma.$$

The lower bound on $A$ in Lemma C.2 comes from an existing result [24]. We include the proof here to
show that the spectrum of the integral term in $A$ is lower and upper bounded by that of $\Sigma_1$ squared, up to
some small multiplicative constants.

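Lemma C.2. ([24]) Let $\Theta_1 = \Sigma_1^{-1}$ and $\Theta_2 = \Sigma_2^{-1}$, and hence $\Theta_2 = \Theta_1 + \Gamma$. Under Assumption 1,

$$\|\Gamma\|_2 \le \frac{\|\Delta\|_2}{\lambda_{\min}(\Sigma_2) \lambda_{\min}(\Sigma_1)} = o(1).$$

Then

$$\frac{\|\Gamma\|_F^2 \, \lambda_{\min}(\Sigma_1)^2}{2 (1 + \lambda_{\min}(\Sigma_1) \|\Gamma\|_2)^2} \le A \le \frac{\|\Gamma\|_F^2 \, \|\Sigma_1\|_2^2}{2 - o(1)}.$$

We are now ready to prove the theorem.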
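
Proof of Theorem 4.6. Let us consider the product measure that is defined by $(\Phi_{ij})$. The ratio of the
two original density functions (prior to normalization) for $\tilde{X} = (x_1, \ldots, x_m)^T \in \mathbb{R}^{m \times p}$ is:

$$\frac{f_{\Sigma_1}(\tilde{X})}{f_{\Sigma_2}(\tilde{X})} = \prod_{j=1}^{m} \frac{f_{\Sigma_1}(x_j)}{f_{\Sigma_2}(x_j)} = \frac{|\Sigma_2|^{m/2}}{|\Sigma_1|^{m/2}} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{m} x_j^T (\Sigma_1^{-1} - \Sigma_2^{-1}) x_j \right\}$$
$$= \exp\left\{ \frac{m}{2} \ln |\Sigma_2| - \frac{m}{2} \ln |\Sigma_1| + \frac{m}{2} \mathrm{tr}\left( \Gamma \frac{\tilde{X}^T \tilde{X}}{m} \right) \right\}.$$

Hence by Proposition C.1 and Lemma C.2 we have that

$$\alpha(m, \delta) \le \left| \frac{m A}{2} - \frac{m}{2} \mathrm{tr}(\Gamma \Sigma_1) + \frac{m}{2} \mathrm{tr}\left( \Gamma \frac{\tilde{X}^T \tilde{X}}{m} \right) \right| = \left| \frac{m A}{2} + \frac{m}{2} \mathrm{tr}\left( \Gamma \left( \frac{\tilde{X}^T \tilde{X}}{m} - \Sigma_1 \right) \right) \right|.$$

Hence for $\tilde{X}$ that satisfies (12), ignoring renormalization, we have for $X_1, X_2$,

$$\alpha(m, \delta) \le \frac{m}{2} \|\mathrm{vec}\,\Gamma\|_1 \max_{j,k} \left| \left( \frac{\tilde{X}^T \tilde{X}}{m} \right)_{jk} - \Sigma_1(j, k) \right| + \frac{m A}{2} \le \frac{m p \|\Gamma\|_F}{2} \left( C \sqrt{\frac{\ln 2np}{m}} + \Delta_{\max} \right) + \frac{m A}{2},$$

where

$$A \le \frac{\|\Gamma\|_F^2 \|\Sigma_1\|_2^2}{2 (1 - o(1))} \le \frac{\|\Delta\|_F^2 \|\Sigma_1\|_2^2}{2 (1 - o(1)) \lambda_{\min}^2(\Sigma_2) \lambda_{\min}^2(\Sigma_1)}.$$

The theorem holds given (14) in Procedure 4.2, Remark 4.3 and Lemma 4.4. □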

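We now show the proof for Lemma C.2. Proof of Proposition C.1 appears in [24].

Proof of Lemma C.2. After factoring out $\|\mathrm{vec}\,\Gamma\|_2^2 = \|\Gamma\|_F^2$, it suffices to bound the extreme eigenvalues of the
integral term in $A$. For the smallest eigenvalue,

$$\lambda_{\min}\left( \int_0^1 (1 - v) (\Theta_1 + v \Gamma)^{-1} \otimes (\Theta_1 + v \Gamma)^{-1} \, dv \right) \ge \int_0^1 (1 - v) \lambda_{\min}^2\left( (\Theta_1 + v \Gamma)^{-1} \right) dv$$
$$\ge \inf_{v \in [0,1]} \lambda_{\min}^2\left( (\Theta_1 + v \Gamma)^{-1} \right) \int_0^1 (1 - v) \, dv = \frac{1}{2} \inf_{v \in [0,1]} \frac{1}{\|\Theta_1 + v \Gamma\|_2^2}$$
$$\ge \inf_{v \in [0,1]} \frac{1}{2 (\|\Theta_1\|_2 + v \|\Gamma\|_2)^2} \ge \frac{1}{2 (\|\Theta_1\|_2 + \|\Gamma\|_2)^2} = \frac{\lambda_{\min}(\Sigma_1)^2}{2 (1 + \lambda_{\min}(\Sigma_1) \|\Gamma\|_2)^2},$$

where the first inequality is due to the fact that the set of $p^2$ eigenvalues of $B(v) \otimes B(v)$, where $B(v) = (\Theta_1 + v \Gamma)^{-1}$,
$\forall v \in [0, 1]$, is $\{ \lambda_i(B(v)) \lambda_j(B(v)), \forall i, j = 1, \ldots, p \}$, which are all positive given that $(\Theta_1 + v \Gamma) \succ 0$,
hence $(\Theta_1 + v \Gamma)^{-1} \succ 0$, $\forall v \in [0, 1]$. Similarly,

$$\lambda_{\max}\left( \int_0^1 (1 - v) (\Theta_1 + v \Gamma)^{-1} \otimes (\Theta_1 + v \Gamma)^{-1} \, dv \right) \le \int_0^1 (1 - v) \lambda_{\max}^2\left( (\Theta_1 + v \Gamma)^{-1} \right) dv$$
$$\le \sup_{v \in [0,1]} \lambda_{\max}^2\left( (\Theta_1 + v \Gamma)^{-1} \right) \int_0^1 (1 - v) \, dv \le \sup_{v \in [0,1]} \frac{1}{2 \lambda_{\min}^2(\Theta_1 + v \Gamma)},$$

where $\forall v \in [0, 1]$,

$$\lambda_{\min}(\Theta_1 + v \Gamma) \ge \lambda_{\min}(\Theta_1) - v \|\Gamma\|_2 = \frac{1 - v \|\Sigma_1\|_2 \|\Gamma\|_2}{\|\Sigma_1\|_2} > 0,$$

where so long as $\|\Delta\|_2 = o(1)$ and $\|\Sigma_1\|_2$ and $\lambda_{\min}(\Sigma_1)$ are within constant order of each other, we have

$$\|\Sigma_1\|_2 \|\Gamma\|_2 \le \frac{\|\Delta\|_2 \|\Sigma_1\|_2}{\lambda_{\min}(\Sigma_2) \lambda_{\min}(\Sigma_1)} = o(1).$$

Hence $\forall v \in [0, 1]$,

$$\frac{1}{\lambda_{\min}(\Theta_1 + v \Gamma)} \le \frac{\|\Sigma_1\|_2}{1 - v \|\Sigma_1\|_2 \|\Gamma\|_2} \le \frac{\|\Sigma_1\|_2}{1 - \|\Sigma_1\|_2 \|\Gamma\|_2},$$

and correspondingly

$$\sup_{v \in [0,1]} \frac{1}{2 \lambda_{\min}^2(\Theta_1 + v \Gamma)} \le \frac{\|\Sigma_1\|_2^2}{2 (1 - \|\Sigma_1\|_2 \|\Gamma\|_2)^2}.$$

□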



D Example with binary data matrix 

We now show how we can achieve differential privacy in the setting where $X_1, X_2 \in \mathbb{R}^{n \times p}$ differ
in a single row. We also define a special case of this general setting. We further illustrate
the idea that one cannot allow differential privacy without imposing certain constraints on $X_1, X_2$. As a
corollary of Theorem 4.6, we consider the following example.

Proposition D.1. For the Binary Game in Example 3.6, we have for all $\tau < 1$ and j el',

$$\|\Delta\|_2 \le \|\Delta\|_F \le \frac{2 p \sqrt{\tau (1 - \tau)}}{n} \le \frac{p}{n}.$$

Proof. For the special case that $X_1$ and $X_2$ differ only in a single row after normalization, such that $x \in X_1$
and $y \in X_2$, we have $\Delta = \Sigma_1 - \Sigma_2 = (x x^T - y y^T) / n$. First note $\|\Delta\|_2 \le \|\Delta\|_F$. In order to bound $\|\Delta\|_F$ let us
define

$$B = x x^T - y y^T, \quad C = (x - y) x^T, \quad D = y (x - y)^T, \qquad (19)$$

where $B = C + D$ are all $p \times p$ matrices. A careful counting of non-zero elements in $C + D$ gives
$\|\Delta\|_2 \le \|\Delta\|_F = 2 p \sqrt{\tau (1 - \tau)} / n \le p / n$ for $\tau \le 1/2$. Note that when $\tau > 1/2$, the effect on $\|\Delta\|_F$ is the same
as flipping $(1 - \tau) p$ bits; hence it is maximized when $\tau = 1/2$. □

Theorem D.2. In the binary game in Example 3.6, by truncating a subset of measure at most $1/n^2$, we have

mp 2 (Cyin 2np/m + 0(l/n)) 



a(m, t) < 



"^min(S2)^imn(2l) 



= o(l) 



for /? = o(*/n/m) andm > Cl(ln2np). 

Proof. Note that for a binary game, A max = max /it < -. As shown in Proposition lD.il 

P 



IAIU < -. 



Plugging the above inequalities in ( fT5T ), we have 



a(m, r) < 



W/ t-min(^2)^min(^l) 



Xxvlnp 2 
+ - + 



2 || Xi ||| 



HZ 



« «l min (S 2 )^min(Xl) 



) 



+ 0(1) 



where for p = o(^Jn/m) and for m > Q(ln2«/j), we have a(m, r) = o(l). □ 